Toward sensitive document release with privacy guarantees
Authors
Abstract
Privacy has become a serious concern for modern Information Societies. The sensitive nature of much of the data that are exchanged daily or released to untrusted parties requires that responsible organizations undertake appropriate privacy protection measures. Nowadays, many of these data are texts (e.g., emails, messages posted on social media, healthcare outcomes) that, because of their unstructured and semantic nature, constitute a challenge for automatic data protection methods. In fact, textual documents are usually protected manually, in a process known as document redaction or sanitization. To do so, human experts identify sensitive terms (i.e., terms that may reveal identities and/or confidential information) and protect them accordingly (e.g., via removal or, preferably, generalization). To relieve experts of this burdensome task, in a previous work we introduced the theoretical basis of C-sanitization, an inherently semantic privacy model that provides the basis for the development of automatic document redaction/sanitization algorithms and offers clear, a priori privacy guarantees on data protection. Despite its potential benefits, C-sanitization still presents some limitations when applied in practice (mainly regarding flexibility, efficiency and accuracy). In this paper, we propose a new, more flexible model, named (C, g(C))-sanitization, which enables an intuitive configuration of the trade-off between the desired level of protection (i.e., controlled information disclosure) and the preservation of the utility of the protected data (i.e., the amount of semantics to be preserved). We also present a set of technical solutions and algorithms that provide an efficient and scalable implementation of the model and improve its practical accuracy, as we illustrate through empirical experiments.
Similar resources
Differentially Private Local Electricity Markets
Privacy-preserving electricity markets have a key role in steering customers towards participation in local electricity markets by guaranteeing the protection of their sensitive information. Moreover, these markets make it possible to statically release and share the market outputs for social good. This paper aims to design a market for local energy communities by implementing Differential Privacy (DP)...
Data masking for privacy-sensitive learning
We study the problem of data release with privacy, where data is made available with privacy guarantees while keeping the usability of the data as high as possible. This is important in healthcare and other domains with sensitive data. In particular, we propose a method of masking sensitive parts of private data while ensuring that a learner trained using the masked data is similar to the learn...
A comparison of compliance with confidentiality principles in legal cases, based on the World Health Organization guideline, in teaching hospitals affiliated with the Iran, Tehran and Shahid Beheshti Universities of Medical Sciences: 1387 SH.
Introduction: In many countries, medical records are important legal documents, essential not only for the present and future care of patients but also as legal documents that protect patients and hospitals. The medical record is a confidential document, and the patient's right to privacy must always be respected. Methods: This is a descriptive cross-sectional study. Study sample were 34...
More Flexible Differential Privacy: The Application of Piecewise Mixture Distributions in Query Release
There is an increasing demand to make data "open" to third parties, as data sharing has great benefits in data-driven decision making. However, with a wide variety of sensitive data collected, protecting the privacy of individuals, communities and organizations is an essential factor in making data "open". The approaches currently adopted by industry in releasing private data are often ad hoc and p...
C-sanitized: a privacy model for document redaction and sanitization
Within the current context of Information Societies, large amounts of information are daily exchanged and/or released. The sensitive nature of much of this information causes a serious privacy threat when documents are uncontrollably made available to untrusted third parties. In such cases, appropriate data protection measures should be undertaken by the responsible organization, especially und...
Journal title: Eng. Appl. of AI
Volume: 59
Publication date: 2017